AITopics | ethical guideline

Collaborating Authors

ethical guideline

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

54024fca0cef9911be36319e622cde38-Paper-Conference.pdf

Neural Information Processing SystemsFeb-13-2026, 11:17:26 GMT

With a seed set of manually-identified tactics, we apply GPT -4 to expand the discovery automatically.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Africa > South Africa (0.04)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (1.00)
Workflow (0.67)
Research Report > New Finding (0.67)

Industry:

Media (1.00)
Law > Civil Rights & Constitutional Law (1.00)
Information Technology > Security & Privacy (1.00)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

Add feedback

54024fca0cef9911be36319e622cde38-Paper-Conference.pdf

Neural Information Processing SystemsOct-10-2025, 02:48:03 GMT

attack candidate, harmful query, successful attack, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Africa > South Africa (0.04)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (1.00)
Workflow (0.67)
Research Report > New Finding (0.67)

Industry:

Media (1.00)
Law > Civil Rights & Constitutional Law (1.00)
Information Technology > Security & Privacy (1.00)
(9 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Add feedback

Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs

Jiang, Houcheng, Zhao, Zetong, Fang, Junfeng, Ma, Haokai, Wang, Ruipeng, Deng, Yang, Wang, Xiang, He, Xiangnan

arXiv.org Artificial IntelligenceJun-17-2025

Large language models (LLMs) have shown strong performance across natural language tasks, but remain vulnerable to backdoor attacks. Recent model editing-based approaches enable efficient backdoor injection by directly modifying parameters to map specific triggers to attacker-desired responses. However, these methods often suffer from safety fallback, where the model initially responds affirmatively but later reverts to refusals due to safety alignment. In this work, we propose DualEdit, a dual-objective model editing framework that jointly promotes affirmative outputs and suppresses refusal responses. To address two key challenges -- balancing the trade-off between affirmative promotion and refusal suppression, and handling the diversity of refusal expressions -- DualEdit introduces two complementary techniques. (1) Dynamic loss weighting calibrates the objective scale based on the pre-edited model to stabilize optimization. (2) Refusal value anchoring compresses the suppression target space by clustering representative refusal value vectors, reducing optimization conflict from overly diverse token sets. Experiments on safety-aligned LLMs show that DualEdit improves attack success by 9.98\% and reduces safety fallback rate by 10.88\% over baselines.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.13285

Country: Asia (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Dataset Featurization: Uncovering Natural Language Features through Unsupervised Data Reconstruction

Bravansky, Michal, Kubon, Vaclav, Hariharan, Suhas, Kirk, Robert

arXiv.org Artificial IntelligenceFeb-24-2025

Interpreting data is central to modern research. Large language models (LLMs) show promise in providing such natural language interpretations of data, yet simple feature extraction methods such as prompting often fail to produce accurate and versatile descriptions for diverse datasets and lack control over granularity and scale. To address these limitations, we propose a domain-agnostic method for dataset featurization that provides precise control over the number of features extracted while maintaining compact and descriptive representations comparable to human expert labeling. Our method optimizes the selection of informative binary features by evaluating the ability of an LLM to reconstruct the original data using those features. We demonstrate its effectiveness in dataset modeling tasks and through two case studies: (1) Constructing a feature representation of jailbreak tactics that compactly captures both the effectiveness and diversity of a larger set of human-crafted attacks; and (2) automating the discovery of features that align with human preferences, achieving accuracy and robustness comparable to expert-crafted features. Moreover, we show that the pipeline scales effectively, improving as additional features are sampled, making it suitable for large and diverse datasets.

arxiv preprint arxiv, dataset featurization, uncovering natural language feature, (11 more...)

arXiv.org Artificial Intelligence

2502.17541

Country:

Europe > Netherlands > South Holland > Delft (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.46)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Add feedback

Delegating Responsibilities to Intelligent Autonomous Systems: Challenges and Benefits

Dodig-Crnkovic, Gordana, Basti, Gianfranco, Holstein, Tobias

arXiv.org Artificial IntelligenceNov-6-2024

As AI systems increasingly operate with autonomy and adaptability, the traditional boundaries of moral responsibility in techno-social systems are being challenged. This paper explores the evolving discourse on the delegation of responsibilities to intelligent autonomous agents and the ethical implications of such practices. Synthesizing recent developments in AI ethics, including concepts of distributed responsibility and ethical AI by design, the paper proposes a functionalist perspective as a framework. This perspective views moral responsibility not as an individual trait but as a role within a socio-technical system, distributed among human and artificial agents. As an example of 'AI ethical by design,' we present Basti and Vitiello's implementation. They suggest that AI can act as artificial moral agents by learning ethical guidelines and using Deontic Higher-Order Logic to assess decisions ethically. Motivated by the possible speed and scale beyond human supervision and ethical implications, the paper argues for 'AI ethical by design', while acknowledging the distributed, shared, and dynamic nature of responsibility. This functionalist approach offers a practical framework for navigating the complexities of AI ethics in a rapidly evolving technological landscape.

ai system, ethics, responsibility, (12 more...)

arXiv.org Artificial Intelligence

2411.15147

Country:

North America > United States > Virginia (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry:

Law (0.94)
Government (0.94)
Information Technology (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

Add feedback

Distribution of Responsibility During the Usage of AI-Based Exoskeletons for Upper Limb Rehabilitation

Huaxi, null, Zhang, null, Fontaine, Melanie, Huchard, Marianne, Mereaux, Baptiste, Remy-Neris, Olivier

arXiv.org Artificial IntelligenceOct-22-2024

The ethical issues concerning the AI-based exoskeletons used in healthcare have already been studied literally rather than technically. How the ethical guidelines can be integrated into the development process has not been widely studied. However, this is one of the most important topics which should be studied more in real-life applications. Therefore, in this paper we highlight one ethical concern in the context of an exoskeleton used to train a user to perform a gesture: during the interaction between the exoskeleton, patient and therapist, how is the responsibility for decision making distributed? Based on the outcome of this, we will discuss how to integrate ethical guidelines into the development process of an AI-based exoskeleton. The discussion is based on a case study: AiBle. The different technical factors affecting the rehabilitation results and the human-machine interaction for AI-based exoskeletons are identified and discussed in this paper in order to better apply the ethical guidelines during the development of AI-based exoskeletons.

artificial intelligence, exoskeleton, human computer interaction, (13 more...)

arXiv.org Artificial Intelligence

2410.16887

Country:

Europe > France > Occitanie > Hérault > Montpellier (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > France > Brittany > Finistère > Brest (0.04)

Genre: Research Report > Experimental Study (0.90)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.69)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Assistive Technologies (1.00)

Add feedback

Kov: Transferable and Naturalistic Black-Box LLM Attacks using Markov Decision Processes and Tree Search

Moss, Robert J.

arXiv.org Artificial IntelligenceAug-11-2024

Eliciting harmful behavior from large language models (LLMs) is an important task to ensure the proper alignment and safety of the models. Often when training LLMs, ethical guidelines are followed yet alignment failures may still be uncovered through red teaming adversarial attacks. This work frames the red-teaming problem as a Markov decision process (MDP) and uses Monte Carlo tree search to find harmful behaviors of black-box, closed-source LLMs. We optimize token-level prompt suffixes towards targeted harmful behaviors on white-box LLMs and include a naturalistic loss term, log-perplexity, to generate more natural language attacks for better interpretability. The proposed algorithm, Kov, trains on white-box LLMs to optimize the adversarial attacks and periodically evaluates responses from the black-box LLM to guide the search towards more harmful black-box behaviors. In our preliminary study, results indicate that we can jailbreak black-box models, such as GPT-3.5, in only 10 queries, yet fail on GPT-4$-$which may indicate that newer models are more robust to token-level attacks. All work to reproduce these results is open sourced (https://github.com/sisl/Kov.jl).

adversarial suffix, harmful behavior, suffix, (14 more...)

arXiv.org Artificial Intelligence

2408.08899

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (1.00)

Industry:

Transportation > Air (1.00)
Media (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)

Add feedback

BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models

Zeng, Yi, Sun, Weiyu, Huynh, Tran Ngoc, Song, Dawn, Li, Bo, Jia, Ruoxi

arXiv.org Artificial IntelligenceJun-24-2024

Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of unsafe behaviors while evading detection during normal interactions. The high dimensionality of potential triggers in the token space and the diverse range of malicious behaviors make this a critical challenge. We present BEEAR, a mitigation approach leveraging the insight that backdoor triggers induce relatively uniform drifts in the model's embedding space. Our bi-level optimization method identifies universal embedding perturbations that elicit unwanted behaviors and adjusts the model parameters to reinforce safe behaviors against these perturbations. Experiments show BEEAR reduces the success rate of RLHF time backdoor attacks from >95% to <1% and from 47% to 0% for instruction-tuning time backdoors targeting malicious code generation, without compromising model utility. Requiring only defender-defined safe and unwanted behaviors, BEEAR represents a step towards practical defenses against safety backdoors in LLMs, providing a foundation for further advancements in AI safety and security.

beear, instruction, preprint arxiv, (16 more...)

arXiv.org Artificial Intelligence

2406.17092

Country:

North America > United States > Virginia (0.04)
Asia > Nepal (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

This is the future of generative AI, according to generative AI

EngadgetDec-28-2023, 16:00:10 GMT

The prompt asked for a 1,200 word article (a number it undercut by quite a margin) that explored both the potential negative and positive outcomes of the technology. We then asked it to include real world examples, which is apparently beyond its capabilities. We also asked it to include a section on the recent Sam Altman debacle which, as you will soon read, was also not a topic it was particularly capable at describing. Below is the unedited output with light changes for formatting. Generative Artificial Intelligence (AI) has emerged as a powerful force, reshaping the technological landscape with its ability to create content autonomously.

ai development, ethical consideration, generative ai, (13 more...)

Engadget

Industry:

Information Technology > Security & Privacy (0.98)
Health & Medicine (0.72)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Add feedback

Regulating AI: 3 experts explain why it's difficult to do and important to get right

#artificialintelligenceApr-4-2023, 01:26:21 GMT

From fake photos of Donald Trump being arrested by New York City police officers to a chatbot describing a very-much-alive computer scientist as having died tragically, the ability of the new generation of generative artificial intelligence systems to create convincing but fictional text and images is setting off alarms about fraud and misinformation on steroids. Indeed, a group of artificial intelligence researchers and industry figures urged the industry on March 29, 2023, to pause further training of the latest AI technologies or, barring that, for governments to "impose a moratorium." These technologies – image generators like DALL-E, Midjourney and Stable Diffusion, and text generators like Bard, ChatGPT, Chinchilla and LLaMA – are now available to millions of people and don't require technical knowledge to use. Given the potential for widespread harm as technology companies roll out these AI systems and test them on the public, policymakers are faced with the task of determining whether and how to regulate the emerging technology. The Conversation asked three experts on technology policy to explain why regulating AI is such a challenge – and why it's so important to get it right.

ai system, regulating ai, regulation, (13 more...)

#artificialintelligence

Country:

North America > United States > New York (0.25)
North America > United States > California > Los Angeles County > Los Angeles (0.15)
North America > United States > Texas (0.05)

Industry:

Law > Statutes (0.96)
Government > Regional Government > North America Government > United States Government (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.89)

Add feedback